Using Natural Language Processing to Improve Optical Character Recognition

Authors

  • Lining Xu
  • Yongxu Wu
Abstract

Optical Character Recognition (OCR) has been developed for over 100 years. However, it does not work well when the document or image is stained. Since words in a text are linked by logical relationships, and borrowing from license plate recognition the idea of reducing the size of the word stock, this paper establishes an N-gram model and uses the results of the Google search engine to mitigate the model's data sparsity. Using the residual features of the original stained characters together with a smaller word stock, the method successfully improves the recognition rate and accuracy.
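The abstract does not spell out the model's details, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a stained word is represented as a pattern in which each unreadable character is `?` (a simplified stand-in for the paper's residual features), that the word stock is a plain list of words, and that search-engine results have already been collected into a dictionary of two-word phrase hit counts (`web_hits`, a hypothetical input). The word stock is first filtered to candidates consistent with the surviving characters; the candidate with the strongest bigram evidence given the previous word is then chosen.

```python
import re
from collections import Counter


def candidates(pattern, lexicon):
    """Keep only lexicon words consistent with the characters that survived.

    `pattern` marks each stained, unreadable character with '?', e.g. 'c?t'.
    Matching on length and surviving characters shrinks the word stock.
    """
    regex = re.compile("".join("." if ch == "?" else re.escape(ch) for ch in pattern))
    return [w for w in lexicon if regex.fullmatch(w)]


class BigramScorer:
    def __init__(self, corpus_tokens, web_hits=None):
        self.unigrams = Counter(corpus_tokens)
        self.bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
        # Hypothetical stand-in for search-engine hit counts of two-word
        # phrases, used to supplement sparse corpus bigram counts.
        self.web_hits = web_hits or {}

    def evidence(self, prev, word):
        # Corpus bigram count plus external phrase count (simply summed here).
        return self.bigrams[(prev, word)] + self.web_hits.get(f"{prev} {word}", 0)

    def best(self, prev, pattern, lexicon):
        cands = candidates(pattern, lexicon)
        if not cands:
            return None
        # Rank by bigram evidence, then by unigram frequency as a tie-breaker.
        return max(cands, key=lambda w: (self.evidence(prev, w), self.unigrams[w]))


lexicon = ["cat", "cot", "cut", "mat", "met", "dog"]
tokens = "the cat sat on the mat and the dog sat on the mat".split()
scorer = BigramScorer(tokens, web_hits={"a cot": 40, "a cat": 120})
print(scorer.best("the", "m?t", lexicon))  # mat (seen twice after "the" in the corpus)
print(scorer.best("a", "c?t", lexicon))    # cat (no corpus evidence; web hits decide)
```

Summing raw corpus counts and web hit counts is a deliberate simplification: the two sources are on very different scales, and a fuller implementation would normalize or weight them, and would score candidates from the actual residual stroke features rather than from exact character matches.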

Similar resources

The Postprocessing of Optical Character Recognition Based on Statistical Noisy Channel and Language Model

The techniques of image processing have been used in optical character recognition (OCR) for a long time. The recognition method has evolved from early "pattern recognition" to, more recently, "feature extraction", and the recognition rate has risen from 70% to 90%. However, character-by-character recognition has its limitations. Using language models to assist the OCR system in improving recognition ...

Blob Detection Technique Using Image Processing for Identification of Machine Printed Characters

Optical character recognition systems have been effectively developed for the recognition of printed characters. Optical character recognition is a powerful computer vision technique with applications ranging from digitally saving real-time scripts to deriving context-based intelligence from text using natural language processing. One such application is the recognition of machine...

Classifier Fusion Method to Recognize Handwritten Kannada Numerals

Optical Character Recognition (OCR) is one of the important fields in the image processing and pattern recognition domain. Handwritten character recognition has always been a challenging task, and little work can be traced towards the recognition of handwritten characters in the south Indian languages. Kannada is one such south Indian language, and it is also one of the official languages of India...

OCR Post-Processing for Low Density Languages

We present a lexicon-free post-processing method for optical character recognition (OCR), implemented using weighted finite state machines. We evaluate the technique in a number of scenarios relevant for natural language processing, including creation of new OCR capabilities for low density languages, improvement of OCR performance for a native commercial system, acquisition of knowledge from a...

A Finite State Model for Urdu Nastalique Optical Character Recognition

Finite state technology has long been used to model NLP (Natural Language Processing) applications; in particular, it has been applied very successfully to machine translation and speech recognition systems. Character recognition in cursive scripts and handwritten Latin script has also attracted researchers’ attention, and some research has been done in this area. Optical character recognition is the ...

Publication date: 2016